.. image:: images/logo.png

-------------------------------------

DSZI models
'''''''''''

What are DSZI models?
=====================

DSZI is an acronym for "Defective Subpopulation Zero Inflated". It is a combination of the Defective Subpopulation (DS) model and the Zero Inflated (ZI) model.

A defective subpopulation model is one in which the CDF does not reach 1 during the period of observation.
This occurs when a portion of the population fails (known as the defective subpopulation) while the remainder of the population has not failed by the end of the observation period and is therefore right censored.

A zero inflated model is one in which the CDF starts above 0 at the start of the observation period.
This is caused by many "dead-on-arrival" items from the population, represented by failure times of 0.
This is not the same as left censored data, since a left censored failure occurred at some unknown time between 0 and the observation time.
In the zero inflated model, the observation period is considered to start at 0, so the failure times of these items are exactly 0.

In a DSZI model, the CDF (which normally goes from 0 to 1) goes from above 0 to below 1, as shown in the image below.
In this image the PDF and CDF have been rescaled so that both can be viewed together. In reality the values of the CDF are much larger than those of the PDF.

.. image:: images/DSZI_explained.png

A DSZI model may be applied to any distribution (Weibull, Normal, Lognormal, etc.) using the transformations explained in the next section.
The plot below shows how a Weibull distribution can become a DS_Weibull, ZI_Weibull and DSZI_Weibull.
Note that the PDF of the DS, ZI, and DSZI models appears smaller than that of the original Weibull model since the area under the PDF is no longer 1.
Because the CDF no longer ranges from 0 to 1, the area under the PDF equals the range that the CDF does span, which is less than 1.

.. image:: images/DSZI_combined.png

Equations of DSZI models
========================

A DSZI Model adds a minor modification to the PDF and CDF of any standard distribution (referred to here as the "base distribution") to transform it into a DSZI Model. The transformations are as follows:

:math:`PDF_{DSZI} = PDF_{base} \times (DS - ZI)`

:math:`CDF_{DSZI} = CDF_{base} \times (DS - ZI) + ZI`

In the above equations the base distribution (represented by :math:`PDF_{base}` and :math:`CDF_{base}`) is transformed using the parameters DS and ZI.
DS is the maximum of the CDF, which represents the fraction of the total population that is defective (the defective subpopulation).
ZI is the minimum of the CDF, which represents the fraction of the total population that failed at t=0, or equivalently was "dead-on-arrival" (the zero inflated fraction).
To create only a DS model we can set ZI to 0. To create only a ZI model we can set DS to 1. The parameters DS and ZI must be between 0 and 1, and DS must be greater than ZI.
The above equations can be expanded depending on the equation of the base distribution. For example, if the base distribution is a two parameter Weibull distribution, the DSZI model would be:

:math:`\text{PDF:} \hspace{11mm} f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{(\beta-1)}{\rm e}^{-(\frac{t}{\alpha })^ \beta } \left(DS - ZI \right)` 

:math:`\text{CDF:} \hspace{10mm} F(t) = \left(1 - {\rm e}^{-(\frac{t}{\alpha })^ \beta }\right) \left(DS - ZI \right) + ZI`

The SF, HF and CHF can be obtained using transformations from the CDF and PDF using the `relationships between the five functions <https://reliability.readthedocs.io/en/latest/Equations%20of%20supported%20distributions.html#relationships-between-the-five-functions>`_.
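
The transformations above can be sketched in a few lines of plain Python. This is only an illustration of the equations for a Weibull base distribution and not the code used inside reliability. The function names are hypothetical, and the SF, HF and CHF use the standard relationships SF = 1 - CDF, HF = PDF / SF and CHF = -ln(SF).

.. code:: python

    import math

    def weibull_dszi_cdf(t, alpha, beta, DS=1.0, ZI=0.0):
        """CDF of a Weibull DSZI model: CDF_base * (DS - ZI) + ZI."""
        if not 0 <= ZI < DS <= 1:
            raise ValueError("DS and ZI must be between 0 and 1 with DS > ZI")
        base_cdf = 1 - math.exp(-((t / alpha) ** beta))
        return base_cdf * (DS - ZI) + ZI

    def weibull_dszi_pdf(t, alpha, beta, DS=1.0, ZI=0.0):
        """PDF of a Weibull DSZI model: PDF_base * (DS - ZI), evaluated for t > 0."""
        base_pdf = (beta / alpha) * (t / alpha) ** (beta - 1) * math.exp(-((t / alpha) ** beta))
        return base_pdf * (DS - ZI)

    def weibull_dszi_sf(t, alpha, beta, DS=1.0, ZI=0.0):
        """Survival function: SF = 1 - CDF."""
        return 1 - weibull_dszi_cdf(t, alpha, beta, DS, ZI)

    def weibull_dszi_hf(t, alpha, beta, DS=1.0, ZI=0.0):
        """Hazard function: HF = PDF / SF."""
        return weibull_dszi_pdf(t, alpha, beta, DS, ZI) / weibull_dszi_sf(t, alpha, beta, DS, ZI)

    def weibull_dszi_chf(t, alpha, beta, DS=1.0, ZI=0.0):
        """Cumulative hazard function: CHF = -ln(SF)."""
        return -math.log(weibull_dszi_sf(t, alpha, beta, DS, ZI))

    # The CDF starts at ZI and levels off at DS
    cdf_at_zero = weibull_dszi_cdf(0, alpha=100, beta=2, DS=0.8, ZI=0.3)
    cdf_at_large_t = weibull_dszi_cdf(1e6, alpha=100, beta=2, DS=0.8, ZI=0.3)
    print(cdf_at_zero, cdf_at_large_t)

Because the SF never drops below 1 - DS, the CHF in this sketch levels off at -ln(1 - DS) instead of growing without bound.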


Creating a DSZI model
=====================

Within reliability, the DSZI Model is available within the Distributions module. The input requires the base distribution to be specified using a distribution object and the DS and ZI parameters to be specified if required.
DS defaults to 1 and ZI defaults to 0. The output API matches the API for the standard distributions.

.. admonition:: API Reference

   For inputs and outputs see the `API reference <https://reliability.readthedocs.io/en/latest/API/Distributions/DSZI_Model.html>`_.

Example 1
---------

In this first example, we will create a Gamma DSZI model and plot the 5 functions.

.. code:: python

    from reliability.Distributions import Gamma_Distribution, DSZI_Model

    # Gamma base distribution with the CDF scaled to run from ZI=0.3 up to DS=0.8
    model = DSZI_Model(distribution=Gamma_Distribution(alpha=50, beta=2), DS=0.8, ZI=0.3)
    model.plot()  # plot the 5 functions

.. image:: images/DSZI_example1.png

Example 2
---------

In this second example, we will create a Lognormal_DS model, draw some random samples and plot those samples on the survival function plot.

.. code:: python

    from reliability.Distributions import Lognormal_Distribution, DSZI_Model
    from reliability.Probability_plotting import plot_points
    import matplotlib.pyplot as plt

    # ZI is left at its default of 0, so this is a DS-only model
    model = DSZI_Model(distribution=Lognormal_Distribution(mu=2, sigma=0.5), DS=0.75)
    # random_samples returns both the failures and the right censored values
    failures, right_censored = model.random_samples(50, seed=7, right_censored_time=50)
    model.SF()  # plot the survival function
    plot_points(failures=failures, right_censored=right_censored, func="SF")
    plt.show()

.. image:: images/DSZI_example2.png

Note that in the above example, the random_samples function returns failures and right_censored values. This differs from all other Distributions, which only return failures.
The reason for returning both failures and right_censored data is that it is essential to have right censored data in order to have a DS model.
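
To see why right censored values arise naturally, the following is a minimal sketch of one way to sample from a DSZI model using the inverse CDF of a Weibull base distribution. It is an assumption made for illustration and is not the algorithm used by random_samples. The function name and its arguments are hypothetical.

.. code:: python

    import math
    import random

    def sample_weibull_dszi(n, alpha, beta, right_censored_time, DS=1.0, ZI=0.0, seed=None):
        """Draw n samples and split them into failures and right censored values."""
        rng = random.Random(seed)
        failures, right_censored = [], []
        for _ in range(n):
            u = rng.random()  # position on the CDF axis, in [0, 1)
            if u < ZI:
                failures.append(0.0)  # dead-on-arrival item
                continue
            p = (u - ZI) / (DS - ZI)  # equivalent position on the base CDF
            if p >= 1.0:
                # outside the defective subpopulation, so this item never fails
                right_censored.append(right_censored_time)
                continue
            t = alpha * (-math.log(1.0 - p)) ** (1.0 / beta)  # inverse Weibull CDF
            if t > right_censored_time:
                right_censored.append(right_censored_time)  # failed after observation ended
            else:
                failures.append(t)
        return failures, right_censored

    failures, right_censored = sample_weibull_dszi(
        10000, alpha=1200, beta=3, right_censored_time=3000, DS=0.7, ZI=0.2, seed=5
    )
    print(len(failures), len(right_censored))

In this sketch every item above DS on the CDF axis can only ever be observed as right censored, which is why a DS model cannot be represented by failure times alone.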

Fitting a DSZI model
====================

.. admonition:: API Reference

   For inputs and outputs see the API reference for `Fit_Weibull_DS <https://reliability.readthedocs.io/en/latest/API/Fitters/Fit_Weibull_DS.html>`_, `Fit_Weibull_ZI <https://reliability.readthedocs.io/en/latest/API/Fitters/Fit_Weibull_ZI.html>`_, and `Fit_Weibull_DSZI <https://reliability.readthedocs.io/en/latest/API/Fitters/Fit_Weibull_DSZI.html>`_.

As we saw above, the DSZI_Model can be either DS, ZI, or DSZI depending on the values of the DS and ZI parameters.
Within the Fitters module, three functions are offered, one for each of these cases, with the Weibull_2P distribution as the base distribution.
The three Fitters available are Fit_Weibull_DS, Fit_Weibull_ZI, and Fit_Weibull_DSZI.
If your data contains zeros then only the Fit_Weibull_ZI and Fit_Weibull_DSZI fitters are appropriate. Using anything else will cause the zeros to be automatically removed and a warning to be printed.
Fit_Weibull_ZI does not mandate that the failures contain zeros, but if failures does not contain zeros then ZI will be 0 and the alpha and beta parameters will be equivalent to the results from Fit_Weibull_2P.
Fit_Weibull_DS does not mandate that right_censored data is provided, but if right_censored data is not provided then DS will be 1 and the alpha and beta parameters will be equivalent to the results from Fit_Weibull_2P.
Fit_Weibull_DSZI does not mandate that failures contain zeros or that right_censored data is provided. If right_censored data is not provided then DS will be 1. If failures does not contain zeros then ZI will be 0. If failures does not contain zeros and no right censored data is provided then DS will be 1, ZI will be 0 and the alpha and beta parameters will be equivalent to the results from Fit_Weibull_2P.
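
The guidance above can be summarised as a small decision helper. This helper is hypothetical and is not part of reliability. It only returns the name of the fitter that the rules above suggest, based on whether the data contains zeros and whether right censored data is present.

.. code:: python

    def suggest_weibull_fitter(failures, right_censored=None):
        """Return the name of the fitter suggested by the rules described above."""
        has_zeros = any(t == 0 for t in failures)
        has_censored = right_censored is not None and len(right_censored) > 0
        if has_zeros and has_censored:
            return "Fit_Weibull_DSZI"
        if has_zeros:
            return "Fit_Weibull_ZI"
        if has_censored:
            return "Fit_Weibull_DS"
        # no zeros and no right censored data: DS=1 and ZI=0, so the base model suffices
        return "Fit_Weibull_2P"

    print(suggest_weibull_fitter([0, 0, 12.5, 40.1], right_censored=[100, 100]))

Whether a DS model is actually warranted for censored data still depends on the data itself, as the comparison in Example 5 shows, so the output of a helper like this is only a starting point.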

Example 3
---------

In this example, we will create 70 samples of failure data from a Weibull Distribution, and append 30 zeros to it. We will then use Fit_Weibull_ZI to model the data.

.. code:: python

    from reliability.Distributions import Weibull_Distribution
    from reliability.Fitters import Fit_Weibull_ZI
    from reliability.Probability_plotting import plot_points
    import numpy as np
    import matplotlib.pyplot as plt
    
    data = Weibull_Distribution(alpha=200, beta=5).random_samples(70, seed=1)
    zeros = np.zeros(30)
    failures = np.hstack([zeros, data])
    plt.subplot(121)
    fit = Fit_Weibull_ZI(failures=failures)
    plt.subplot(122)
    fit.distribution.CDF()
    plot_points(failures=failures)
    plt.tight_layout()
    plt.show()

    '''
    Results from Fit_Weibull_ZI (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 100/0 (0% right censored) 
    
    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         192.931         5.33803   182.747   203.682
         Beta         4.53177        0.431272   3.76064   5.46102
           ZI             0.3       0.0458258  0.218403  0.396613 
    
    Goodness of fit    Value
     Log-likelihood -426.504
               AICc  859.259
                BIC  866.824
                 AD  5.88831 
    '''

.. image:: images/DSZI_example3.png

We can see above how the fitter correctly identified that the distribution was 30% zero inflated, and it did a reasonable job of finding the alpha and beta parameters of the base distribution.
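
As a quick sanity check on the ZI row of the results, the point estimate equals the fraction of zeros in the data (30 out of 100), and the reported standard error agrees with the usual standard error of a binomial proportion. This is an observation about the printed numbers and not a description of how Fit_Weibull_ZI computes them.

.. code:: python

    import math

    n_total = 100  # 30 zeros + 70 Weibull samples
    n_zeros = 30

    zi_estimate = n_zeros / n_total
    # standard error of a binomial proportion: sqrt(p * (1 - p) / n)
    zi_standard_error = math.sqrt(zi_estimate * (1 - zi_estimate) / n_total)

    print(zi_estimate, round(zi_standard_error, 7))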

Example 4
---------

In this example, we will use Fit_Weibull_DS to model some data that is heavily right censored. The DS=0.4 parameter means that only around 40% of the samples are expected to be failures, with the rest being right censored.
The original distribution is overlaid in the plot for comparison of the goodness of fit.

.. code:: python

    from reliability.Distributions import DSZI_Model, Weibull_Distribution
    from reliability.Fitters import Fit_Weibull_DS
    import matplotlib.pyplot as plt
    from reliability.Probability_plotting import plot_points
    
    model = DSZI_Model(distribution=Weibull_Distribution(alpha=70, beta=2.5), DS=0.4)
    failures, right_censored = model.random_samples(100, right_censored_time=120, seed=3)
    model.CDF(label="true model", xmax=300)
    fit_DS = Fit_Weibull_DS(failures=failures, right_censored=right_censored, show_probability_plot=False)
    fit_DS.distribution.CDF(label="fitted Weibull_DS", xmax=300)
    plot_points(failures=failures, right_censored=right_censored)
    plt.legend()
    plt.show()

    '''
    Results from Fit_Weibull_DS (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 41/59 (59% right censored)

    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         67.9275         4.61424   59.4599   77.6009
         Beta         2.63207        0.357826    2.0164   3.43571
           DS        0.414739       0.0500682  0.321106  0.514964 
    
    Goodness of fit    Value
     Log-likelihood -254.236
               AICc  514.721
                BIC  522.287
                 AD  374.746     
    '''

.. image:: images/DSZI_example4.png

Example 5
---------

In this example, we will use some real world data from a vehicle manufacturer, which is available in the Datasets module.
This example shows how the Weibull_2P model can be an inappropriate choice for a dataset that is heavily right censored.
In addition to the visual proof provided by the probability plot (left) and the CDF (right), we can see that the goodness of fit criteria indicate that Weibull_DS was a much better fit (closer to zero) than Weibull_2P.

.. code:: python
    
    from reliability.Fitters import Fit_Weibull_DS, Fit_Weibull_2P
    import matplotlib.pyplot as plt
    from reliability.Probability_plotting import plot_points
    from reliability.Datasets import defective_sample
    
    failures = defective_sample().failures
    right_censored = defective_sample().right_censored
    
    plt.subplot(121)
    fit_DS = Fit_Weibull_DS(failures=failures, right_censored=right_censored)
    print('-------------------------------------------')
    fit_2P = Fit_Weibull_2P(failures=failures, right_censored=right_censored)
    
    plt.subplot(122)
    fit_DS.distribution.CDF(label="fitted Weibull_DS",xmax=1000)
    fit_2P.distribution.CDF(label="fitted Weibull_2P",xmax=1000)
    plot_points(failures=failures, right_censored=right_censored)
    plt.ylim(0,0.25)
    plt.legend()
    plt.title('Cumulative Distribution Function')
    plt.suptitle('Comparison of Weibull_2P with Weibull_DS')
    plt.gcf().set_size_inches(12,6)
    plt.tight_layout()
    plt.show()

    '''
    Results from Fit_Weibull_DS (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 1350/12295 (90.10627% right censored) 
    
    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         170.983         4.61716   162.169   180.276
         Beta         1.30109       0.0297713   1.24403   1.36077
           DS         0.12482      0.00333709  0.118425  0.131509 
    
    Goodness of fit    Value
     Log-likelihood -11977.7
               AICc  23961.3
                BIC  23983.9
                 AD  27212.4 
    
    -------------------------------------------
    Results from Fit_Weibull_2P (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 1350/12295 (90.10627% right censored) 
    
    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         10001.5         883.952    8410.7   11893.1
         Beta        0.677348        0.016663  0.645463  0.710807 
    
    Goodness of fit    Value
     Log-likelihood -12273.2
               AICc  24550.3
                BIC  24565.4
                 AD    27213 
    '''

.. image:: images/DSZI_example5.png
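
The AICc and BIC values printed above can be approximately reproduced from the log-likelihood using the standard formulas. The sketch below assumes that n is the total number of data points (failures + right censored) and k is the number of fitted parameters (3 for Weibull_DS and 2 for Weibull_2P). Small differences from the printed values are expected because the log-likelihood is only printed to one decimal place.

.. code:: python

    import math

    def aicc(log_likelihood, k, n):
        """Corrected Akaike information criterion."""
        return 2 * k - 2 * log_likelihood + (2 * k ** 2 + 2 * k) / (n - k - 1)

    def bic(log_likelihood, k, n):
        """Bayesian information criterion."""
        return k * math.log(n) - 2 * log_likelihood

    n = 1350 + 12295  # failures + right censored

    ds_aicc = aicc(-11977.7, k=3, n=n)
    ds_bic = bic(-11977.7, k=3, n=n)
    w2p_aicc = aicc(-12273.2, k=2, n=n)
    w2p_bic = bic(-12273.2, k=2, n=n)

    print(round(ds_aicc, 1), round(ds_bic, 1))
    print(round(w2p_aicc, 1), round(w2p_bic, 1))

Both criteria penalise the extra DS parameter, yet Weibull_DS still scores lower on each, which supports the conclusion that it is the better model for this dataset.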

Example 6
---------

In this example we will create a DSZI model with DS=0.7 and ZI=0.2.
Based on these parameters, we expect around 70% of the random samples (failures + right censored) to be failures, and around 20% of the random samples to be zeros due to the zero inflated fraction.
We draw the random samples from the model and then fit a Weibull_DSZI model to the data.
The result is surprisingly accurate, showing DS=0.700005 and ZI=0.22, with the alpha and beta parameters closely resembling the parameters of the input Weibull Distribution.
The plot below shows the CDF on the Weibull probability plot (left) and on linear axes (right) which each provide a different perspective of how the distribution models the failure points.

.. code:: python
    
    from reliability.Distributions import DSZI_Model, Weibull_Distribution
    from reliability.Probability_plotting import plot_points
    import matplotlib.pyplot as plt
    from reliability.Fitters import Fit_Weibull_DSZI
    
    model = DSZI_Model(distribution=Weibull_Distribution(alpha=1200,beta=3),DS=0.7,ZI=0.2)
    failures, right_censored = model.random_samples(100,seed=5,right_censored_time=3000)
    
    plt.subplot(121)
    fit = Fit_Weibull_DSZI(failures=failures,right_censored=right_censored,label='fitted Weibull_DSZI')
    model.CDF(label='true model')
    plt.legend()
    
    plt.subplot(122)
    fit.distribution.CDF(label='fitted Weibull_DSZI')
    model.CDF(label='true model')
    plot_points(failures=failures,right_censored=right_censored)
    plt.legend()
    plt.tight_layout()
    plt.show()

    '''
    Results from Fit_Weibull_DSZI (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 70/30 (30% right censored) 
    
    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
        Alpha         1170.12         68.0933   1043.99   1311.49
         Beta         2.60255        0.299069   2.07771   3.25997
           DS        0.700005        0.045826  0.603391  0.781602
           ZI            0.22       0.0414247  0.149465  0.311627 
    
    Goodness of fit    Value
     Log-likelihood -463.613
               AICc  935.647
                BIC  945.646
                 AD  166.025 
    '''

.. image:: images/DSZI_example6.png
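
The expected make-up of the 100 samples follows directly from DS and ZI. The short calculation below is a sketch of that arithmetic, using hypothetical variable names, for comparison with the 70 failures and 30 right censored values reported by the fitter.

.. code:: python

    n = 100
    DS = 0.7
    ZI = 0.2

    expected_zeros = round(ZI * n)  # dead-on-arrival items
    expected_nonzero_failures = round((DS - ZI) * n)  # failures from the base distribution
    expected_right_censored = round((1 - DS) * n)  # items that never fail

    print(expected_zeros, expected_nonzero_failures, expected_right_censored)

The fitted ZI of 0.22 is slightly above the expected 0.2, which is within the variation we would expect from drawing only 100 random samples.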

The DSZI model is a model of my own making. It combines the well-established DS and ZI models for the first time, enabling heavily right censored data to be modelled using a DS distribution while also allowing for zero inflation of the failures.